-
This work-in-progress research paper describes a study of different categorical data coding procedures for machine learning (ML) in engineering education. Although often left out of methodology sections, preprocessing steps in data analysis can have important ramifications for project outcomes. In this study, we applied three coding schemes (scalar conversion, one-hot encoding, and binary) to the categorical variable of Race across three ML models (Neural Network, Random Forest, and Naive Bayes classifiers) and compared the four standard measures of ML classification performance (accuracy, precision, recall, and F1-score). Results showed that, in general, the coding scheme did not affect predictive outcomes as much as ML model type did. However, one-hot encoding (the strategy of transforming a categorical variable with k possible values into k binary nodes, a common practice in educational research) does not work well with a Naive Bayes classifier model. Our results indicate that such sensitivity studies are necessary at the beginning of ML modeling projects. Future work includes performing a full range of sensitivity studies on our complete, grant-funded project dataset and publishing our findings.
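The three coding schemes named in the abstract can be illustrated with a minimal sketch. The function names and the placeholder category labels ("A" to "E") are hypothetical and are not taken from the paper. "Binary" is interpreted here as writing the category index in base 2, which is an assumption: the abstract does not define the term, and it could instead mean collapsing the variable to a two-level indicator.

```python
def scalar_encode(values, categories):
    """Scalar conversion: map each category to a single integer index."""
    index = {cat: i for i, cat in enumerate(categories)}
    return [index[v] for v in values]


def one_hot_encode(values, categories):
    """One-hot encoding: k categories become k binary columns."""
    index = {cat: i for i, cat in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1
        rows.append(row)
    return rows


def binary_encode(values, categories):
    """Binary encoding (assumed meaning): the category index written in
    base 2, most significant bit first, using about log2(k) columns."""
    index = {cat: i for i, cat in enumerate(categories)}
    width = max(1, (len(categories) - 1).bit_length())
    return [
        [(index[v] >> bit) & 1 for bit in reversed(range(width))]
        for v in values
    ]


# Hypothetical placeholder labels, not the categories used in the study.
categories = ["A", "B", "C", "D", "E"]
sample = ["A", "C", "E"]

scalar = scalar_encode(sample, categories)
one_hot = one_hot_encode(sample, categories)
binary = binary_encode(sample, categories)
```

With five categories, scalar conversion yields one column, one-hot encoding yields five, and this binary variant yields three. The column count and the implied ordering differ between schemes, and that difference is the kind of thing a sensitivity study across model types would probe.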
-
Abstract: The Cambro‐Ordovician interval marks a significant transition from extinction to bio‐diversification in deep time. However, the relationship of this bio‐transition to volcanism, commonly characterized by mercury (Hg) systematics in sedimentary records, has not been examined. We present the first Cambro‐Ordovician Hg systematics from the Scandinavian Alum Shale. Our results show pronounced Furongian Hg enrichments, coupled with positive Δ199Hg, Δ200Hg, and Δ201Hg values and negative Δ204Hg values, which we ascribe to atmospheric Hg transport over long distances. Early Ordovician Hg anomalies, by contrast, are characterized by near‐zero mass‐independent isotope values, indicative of a submarine source. Our findings are supported by two new proxies, molybdenum‐Hg and vanadium‐δ202Hg co‐variations, which demonstrate that Hg systematics were strongly influenced by changes in source and depositional conditions. Constrained by a synchronous atmospheric‐tectonic‐oceanic model, we hypothesize that Furongian subaerial volcanism contributed to global extinction and oceanic anoxia, whereas Early Ordovician submarine volcanism concurrent with ocean water upwelling promoted the nascent bio‐diversification.
-
Summary: Finding a suitable representation of multivariate data is fundamental in many scientific disciplines. Projection pursuit (PP) aims to extract interesting ‘non‐Gaussian’ features from multivariate data, and tends to be computationally intensive even when applied to data of low dimension. In high‐dimensional settings, a recent work (Bickel et al., 2018) on PP addresses asymptotic characterization and conjectures of the feasible projections as the dimension grows with sample size. To gain practical utility from PP and theoretical insight into it in an integrated way, data analytic tools for evaluating the behaviour of PP in high dimensions are increasingly desirable but remain less explored in the literature. This paper focuses on developing computationally fast and effective approaches central to finite sample studies for (i) visualizing the feasibility of PP in extracting features from high‐dimensional data, as compared with alternative methods like [...] and [...], and (ii) assessing the plausibility of PP in cases where asymptotic studies are lacking or unavailable, with the goal of better understanding the practicality, limitations and challenges of PP in the analysis of large data sets.
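To make the idea of searching for "non‐Gaussian" projections concrete, the following is a rough sketch of projection pursuit by random search. It is not the method developed in the paper. The projection index (absolute excess kurtosis), the random-search strategy, the function names, and the synthetic data are all assumptions chosen for illustration, and the usual sphering (whitening) step is omitted for brevity.

```python
import math
import random


def excess_kurtosis(values):
    """Sample excess kurtosis; close to 0 for Gaussian-looking data."""
    n = len(values)
    mean = sum(values) / n
    centered = [v - mean for v in values]
    var = sum(c * c for c in centered) / n
    if var == 0.0:
        return 0.0
    fourth = sum(c ** 4 for c in centered) / n
    return fourth / (var * var) - 3.0


def random_unit_vector(dim, rng):
    """Draw a direction uniformly on the unit sphere."""
    vec = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]


def project(data, direction):
    """Project each row of data onto a single direction."""
    return [sum(x * d for x, d in zip(row, direction)) for row in data]


def projection_pursuit(data, n_candidates=300, seed=0):
    """Return the candidate direction whose 1-D projection departs most
    from Gaussian, as scored by absolute excess kurtosis (assumed index)."""
    rng = random.Random(seed)
    dim = len(data[0])
    best_dir, best_score = None, -1.0
    for _ in range(n_candidates):
        direction = random_unit_vector(dim, rng)
        score = abs(excess_kurtosis(project(data, direction)))
        if score > best_score:
            best_dir, best_score = direction, score
    return best_dir, best_score


# Synthetic data (hypothetical): one bimodal coordinate, two Gaussian ones.
data_rng = random.Random(1)
data = [
    [data_rng.choice([-2.0, 2.0]) + data_rng.gauss(0.0, 0.3),
     data_rng.gauss(0.0, 1.0),
     data_rng.gauss(0.0, 1.0)]
    for _ in range(200)
]
direction, score = projection_pursuit(data)
```

Each candidate costs one pass over the data, and the number of candidates needed to cover the unit sphere grows quickly with dimension. This gives a small-scale sense of why PP becomes computationally demanding as dimension increases.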